Biostatistics For Dummies, 2nd Edition (Monika Wahi, John Pezzullo)

CHAPTER 9 Summarizing and Graphing Your Data 117

from lowest to highest mmHg, you can list them as 84, 84, 89, 91, 110, 114,

and 116. There are seven values, and 91 is the fourth of the seven sorted values, so

that is the median. Three DBPs in the sample are smaller than 91 mmHg, and

three are larger than 91 mmHg. If you have an even number of values, the median

is the average of the two middle values. So imagine that you add a value of 118

mmHg to the top of your list, so you now have eight values. To get the median, you

would make an average of the fourth and fifth value, which would be (91 + 110)/2

= 100.5 mmHg (don’t be thrown off by the 0.5).

Statisticians often say that they prefer the median to the mean because the median

is much less strongly influenced by extreme outliers than the mean. For example,

if the largest value for DBP had been very high — such as 150 mmHg instead of

116 mmHg — the mean would have jumped from 98.3 mmHg up to 103.1 mmHg.

But in the same case, the median would have remained unchanged at 91. Here’s an

even more extreme example: If a multibillionaire were to move into a certain

state, the mean family net worth in that state might rise by hundreds of dollars,

but the median family net worth would probably rise by only a few cents (if it were

to rise at all). This is why you often hear the median rather than mean income in

reports comparing income across regions.

Mode

The mode of a sample of numbers is the most frequently occurring value in the

sample. One way to remember this is to consider that mode means fashion in

French, so the mode is the most popular value in the data set. But the mode has

several issues when it comes to summarizing the centrality of observed values for

continuous numerical variables. Often there are no exact duplicates, so there is no

mode. If there are any exact duplicates, they usually are not in the center of the

data. And if there is more than one value that is duplicated the same number of

times, you will have more than one mode.

So the mode is not a good summary statistic for sampled data. But it’s useful for

characterizing a population distribution, because it’s the value where the peak of

the distribution function occurs. Some distribution functions can have two peaks

(a bimodal distribution), as shown earlier in Figure 9-2d, indicating two distinct

subpopulations, such as the distribution of age of death from influenza in many

populations, where we see a mode in young children, and another mode in older

adults.

Considering some other “means” to

measure central tendency

Several other kinds of means are useful measures of central tendency in certain

circumstances. They’re called means because they all calculated using the same